Overview
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 428 | 435 |
| Missing cells (%) | 8.0% | 8.1% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
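The percentages in this table follow from the raw counts. Missing cells are divided by the total number of cells (variables × observations), and the average record size is the in-memory size divided by the number of rows. Below is a minimal check using the figures above. The helper names are illustrative and are not part of ydata-profiling. Because the total size is already rounded to 45.3 KiB, the record size is approximate.

```python
def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells, as a percentage of all cells in the table."""
    return 100.0 * missing / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average in-memory size of one row, given the total size in KiB."""
    return total_kib * 1024 / n_obs


# Dataset A: 428 missing cells out of 12 * 446 = 5352 cells
print(f"{missing_cells_pct(428, 12, 446):.1f}%")    # 8.0%
# Dataset B: 435 missing cells
print(f"{missing_cells_pct(435, 12, 446):.1f}%")    # 8.1%
# 45.3 KiB spread over 446 rows
print(f"{avg_record_size_bytes(45.3, 446):.1f} B")  # 104.0 B
```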
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Fare is highly overall correlated with Pclass | Alert not present in this dataset | High correlation |
| Pclass is highly overall correlated with Fare | Alert not present in this dataset | High correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation |
| Age has 85 (19.1%) missing values | Age has 92 (20.6%) missing values | Missing |
| Cabin has 342 (76.7%) missing values | Cabin has 343 (76.9%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 307 (68.8%) zeros | SibSp has 311 (69.7%) zeros | Zeros |
| Parch has 334 (74.9%) zeros | Parch has 353 (79.1%) zeros | Zeros |
| Fare has 6 (1.3%) zeros | Fare has 6 (1.3%) zeros | Zeros |
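The "High correlation" alerts come from the report's pairwise association analysis. A standard measure for two categorical columns such as Sex and Survived is Cramér's V. The sketch below assumes that this coefficient drives the alert, but the report does not state which coefficient or threshold it used. The Sex × Survived cross-tabulation is not included in this report, so the tables in the example are made up.

```python
from __future__ import annotations

import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V for an r x c contingency table (no bias correction).

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-squared statistic against the independence model
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical Sex x Survived tables (rows: male/female, columns: 0/1)
print(cramers_v([[10, 0], [0, 10]]))  # 1.0, perfect association
print(cramers_v([[5, 5], [5, 5]]))    # 0.0, independence
```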
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-11-21 05:12:36.223156 | 2025-11-21 05:12:38.091184 |
| Analysis finished | 2025-11-21 05:12:38.088604 | 2025-11-21 05:12:39.960627 |
| Duration | 1.87 seconds | 1.87 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
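Duration is the difference between the start and finish timestamps above. The standard-library check below confirms it. It also shows that the Dataset B analysis started about 2.6 ms after the Dataset A analysis finished, so the two profiles were computed one after the other.

```python
from datetime import datetime


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    start = datetime.fromisoformat(started)
    end = datetime.fromisoformat(finished)
    return (end - start).total_seconds()


a = duration_seconds("2025-11-21 05:12:36.223156", "2025-11-21 05:12:38.088604")
b = duration_seconds("2025-11-21 05:12:38.091184", "2025-11-21 05:12:39.960627")
gap = duration_seconds("2025-11-21 05:12:38.088604", "2025-11-21 05:12:38.091184")
print(f"{a:.2f} s, {b:.2f} s, gap {gap * 1000:.1f} ms")  # 1.87 s, 1.87 s, gap 2.6 ms
```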
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 441.1861 | 429.84978 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 891 | 889 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5th percentile | 30.5 | 47.25 |
| Q1 | 222.25 | 211.25 |
| Median | 441.5 | 405.5 |
| Q3 | 662.75 | 646.75 |
| 95th percentile | 844.75 | 832.5 |
| Maximum | 891 | 889 |
| Range | 890 | 888 |
| Interquartile range (IQR) | 440.5 | 435.5 |
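Several quantiles fall between observed IDs (for example, Q1 = 222.25 for an integer ID column). This is consistent with linear interpolation between order statistics, which is the default method in pandas and NumPy, and the sketch below assumes that method. The IQR and Range rows are then plain differences: Q3 - Q1 and Maximum - Minimum.

```python
from __future__ import annotations

import math


def quantile_linear(values: list[float], q: float) -> float:
    """q-th quantile (0 <= q <= 1), interpolating linearly between order statistics."""
    data = sorted(values)
    pos = q * (len(data) - 1)              # fractional position in the sorted data
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


print(quantile_linear([1, 2, 3, 4], 0.25), quantile_linear([1, 2, 3, 4], 0.5))  # 1.75 2.5

# IQR and range of PassengerId (Dataset A), from the rows above
q1, q3, minimum, maximum = 222.25, 662.75, 1, 891
print(q3 - q1, maximum - minimum)  # 440.5 890
```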
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 261.69006 | 253.2678 |
| Coefficient of variation (CV) | 0.5931512 | 0.58920072 |
| Kurtosis | -1.2158191 | -1.1896159 |
| Mean | 441.1861 | 429.84978 |
| Median Absolute Deviation (MAD) | 221 | 215.5 |
| Skewness | -0.010241622 | 0.057135374 |
| Sum | 196769 | 191713 |
| Variance | 68481.689 | 64144.577 |
| Monotonicity | Not monotonic | Not monotonic |
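Several rows can be derived from others:

- The coefficient of variation is the standard deviation divided by the mean.
- The variance is the square of the standard deviation.
- The sum is the mean times the number of non-missing observations.

Below is a consistency check on Dataset A's PassengerId figures. The inputs are copied from the tables, so the last digits are subject to rounding.

```python
n = 446              # non-missing observations
mean = 441.1861
std = 261.69006

cv = std / mean
variance = std ** 2
total = mean * n

print(f"CV       = {cv:.7f}")        # 0.5931512
print(f"Variance = {variance:.1f}")  # 68481.7
print(f"Sum      = {round(total)}")  # 196769
```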
[Figure: histogram with fixed size bins (bins=50)]
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 232 | 1 | 0.2% |
| 379 | 1 | 0.2% |
| 120 | 1 | 0.2% |
| 885 | 1 | 0.2% |
| 485 | 1 | 0.2% |
| 293 | 1 | 0.2% |
| 74 | 1 | 0.2% |
| 725 | 1 | 0.2% |
| 338 | 1 | 0.2% |
| 334 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 654 | 1 | 0.2% |
| 527 | 1 | 0.2% |
| 570 | 1 | 0.2% |
| 72 | 1 | 0.2% |
| 377 | 1 | 0.2% |
| 294 | 1 | 0.2% |
| 121 | 1 | 0.2% |
| 73 | 1 | 0.2% |
| 137 | 1 | 0.2% |
| 553 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 11 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
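The histogram plot is not part of this export; only its tables survive. "Fixed size bins" means the span from the minimum to the maximum is cut into equal-width intervals: 50 here, and 7 for SibSp and Parch. The sketch below assumes NumPy-style semantics, where every bin is half-open except the last, which also includes the maximum. It illustrates the idea and is not ydata-profiling's own code. Values lying exactly on an inner edge may be assigned differently because of floating-point rounding.

```python
from __future__ import annotations


def fixed_width_histogram(values: list[float], bins: int) -> list[int]:
    """Count values into `bins` equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0                        # all values identical: one occupied bin
        else:
            idx = int((v - lo) / width)
            idx = min(idx, bins - 1)       # the maximum falls into the last, closed bin
        counts[idx] += 1
    return counts


print(fixed_width_histogram([0, 1, 2, 3], bins=2))  # [2, 2]
```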
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 0 | 1 |
| 3rd row | 0 | 1 |
| 4th row | 0 | 0 |
| 5th row | 1 | 1 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 290 | 65.0% |
| 1 | 156 | 35.0% |
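Each frequency is the count divided by the total number of rows (446 in both datasets), with missing values included in the denominator. For example, 274 / 446 ≈ 61.4% for Survived = 0 in Dataset A. The minimal sketch below uses `collections.Counter` and rebuilds the column from the counts above. The helper name is illustrative.

```python
from __future__ import annotations

from collections import Counter


def value_frequencies(values: list[str]) -> list[tuple[str, int, float]]:
    """Return (value, count, percent of all rows), most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100.0 * count / n) for value, count in counts.most_common()]


# Survived, Dataset A: 274 zeros and 172 ones
survived_a = ["0"] * 274 + ["1"] * 172
for value, count, pct in value_frequencies(survived_a):
    print(f"| {value} | {count} | {pct:.1f}% |")
# | 0 | 274 | 61.4% |
# | 1 | 172 | 38.6% |
```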
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 290 | 65.0% |
| 1 | 156 | 35.0% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 3 | 2 |
| 3rd row | 3 | 3 |
| 4th row | 3 | 3 |
| 5th row | 1 | 3 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 248 | 55.6% |
| 1 | 108 | 24.2% |
| 2 | 90 | 20.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 251 | 56.3% |
| 1 | 108 | 24.2% |
| 2 | 87 | 19.5% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 248 | 55.6% |
| 1 | 108 | 24.2% |
| 2 | 90 | 20.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 251 | 56.3% |
| 1 | 108 | 24.2% |
| 2 | 87 | 19.5% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 67 |
| Median length | 51 | 49 |
| Mean length | 26.827354 | 26.630045 |
| Min length | 12 | 13 |
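Length statistics are based on the number of characters in each value. The sketch below applies plain min/median/mean/max to the five Dataset A names listed under Sample, so its numbers differ from the column-wide figures. Note that the reported median length (51) cannot be an ordinary median of per-name lengths. If half of the 446 names had at least 51 characters and the rest at least 12, the mean would be at least (51 + 12) / 2 = 31.5, which is above the reported 26.8. The report therefore appears to compute that row differently, but this section does not say how.

```python
import statistics

# The five Dataset A sample names from the "Sample" table of this section
names = [
    "Larsson, Mr. Bengt Edvin",
    "Betros, Mr. Tannous",
    "Andersson, Miss. Ellis Anna Maria",
    "Sutehall, Mr. Henry Jr",
    "Bishop, Mr. Dickinson H",
]

lengths = [len(name) for name in names]
print("Max length   ", max(lengths))                # 33
print("Median length", statistics.median(lengths))  # 23
print("Mean length  ", statistics.mean(lengths))    # 24.2
print("Min length   ", min(lengths))                # 19
```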
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Larsson, Mr. Bengt Edvin | O'Leary, Miss. Hanora "Norah" |
| 2nd row | Betros, Mr. Tannous | Ridsdale, Miss. Lucy |
| 3rd row | Andersson, Miss. Ellis Anna Maria | Jonsson, Mr. Carl |
| 4th row | Sutehall, Mr. Henry Jr | Goodwin, Miss. Lillian Amy |
| 5th row | Bishop, Mr. Dickinson H | Landergren, Miss. Aurora Adelia |
Words
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| mr | 259 | 14.2% |
| miss | 91 | 5.0% |
| mrs | 68 | 3.7% |
| john | 25 | 1.4% |
| william | 24 | 1.3% |
| master | 22 | 1.2% |
| henry | 20 | 1.1% |
| george | 12 | 0.7% |
| edward | 11 | 0.6% |
| james | 11 | 0.6% |
| Other values (866) | 1277 | 70.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| mr | 270 | 14.9% |
| miss | 89 | 4.9% |
| mrs | 65 | 3.6% |
| william | 30 | 1.7% |
| henry | 23 | 1.3% |
| john | 23 | 1.3% |
| master | 15 | 0.8% |
| thomas | 13 | 0.7% |
| james | 12 | 0.7% |
| charles | 11 | 0.6% |
| Other values (887) | 1257 | 69.5% |
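The word tables appear to count lower-cased tokens with surrounding punctuation removed. For example, "Mr." is reported as "mr", while Ticket words such as "c.a", "a/5" and "soton/o.q" keep their inner punctuation. The sketch below is a rough approximation under that assumption: it splits on whitespace and applies `str.strip(string.punctuation)` to each token. ydata-profiling's actual tokenizer, and any stop-word filtering it applies, may differ. The sample values come from the Sample tables of this report.

```python
from __future__ import annotations

import string
from collections import Counter


def word_counts(texts: list[str]) -> Counter:
    """Count lower-cased, whitespace-separated tokens, stripping edge punctuation."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(string.punctuation)
            if token:                      # skip tokens made only of punctuation
                counts[token] += 1
    return counts


names = ['O\'Leary, Miss. Hanora "Norah"', "Ridsdale, Miss. Lucy", "Jonsson, Mr. Carl"]
tickets = ["W./C. 14258", "SOTON/OQ 392076", "CA 2144"]
print(word_counts(names)["miss"])    # 2
print(sorted(word_counts(tickets)))  # ['14258', '2144', '392076', 'ca', 'soton/oq', 'w./c']
```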
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1376 | 11.5% |
| r | 981 | 8.2% |
| e | 866 | 7.2% |
| a | 835 | 7.0% |
| s | 657 | 5.5% |
| n | 655 | 5.5% |
| i | 617 | 5.2% |
| M | 555 | 4.6% |
| l | 513 | 4.3% |
| o | 495 | 4.1% |
| Other values (50) | 4415 | 36.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1363 | 11.5% |
| r | 967 | 8.1% |
| e | 847 | 7.1% |
| a | 835 | 7.0% |
| i | 652 | 5.5% |
| s | 647 | 5.4% |
| n | 645 | 5.4% |
| M | 567 | 4.8% |
| l | 547 | 4.6% |
| o | 474 | 4.0% |
| Other values (49) | 4333 | 36.5% |
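The most frequent character in both Name tables is the space, which the report renders as an empty cell. Character frequencies are taken over every character of every value, so the denominator is the total character count. For Dataset A this is 446 × 26.827354 ≈ 11,965 characters, giving 1376 / 11965 ≈ 11.5% for the space. The sketch below counts characters in three of the Dataset A sample names, with spaces included.

```python
from collections import Counter

names = ["Larsson, Mr. Bengt Edvin", "Betros, Mr. Tannous", "Bishop, Mr. Dickinson H"]

chars = Counter("".join(names))   # every character, including spaces and punctuation
total = sum(chars.values())
for ch, count in chars.most_common(3):
    label = "(space)" if ch == " " else ch
    print(f"| {label} | {count} | {100 * count / total:.1f}% |")
# | (space) | 8 | 12.1% |
# | n | 7 | 10.6% |
# | s | 6 | 9.1% |
```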
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 11965 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 11877 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 11965 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 11877 | 100.0% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 11965 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 11877 | 100.0% |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7174888 | 4.6860987 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | female |
| 2nd row | male | female |
| 3rd row | female | male |
| 4th row | male | female |
| 5th row | male | female |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 286 | 64.1% |
| female | 160 | 35.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 293 | 65.7% |
| female | 153 | 34.3% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| e | 606 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 160 | 7.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| e | 599 | 28.7% |
| m | 446 | 21.3% |
| a | 446 | 21.3% |
| l | 446 | 21.3% |
| f | 153 | 7.3% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2104 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2090 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2104 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2090 | 100.0% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2104 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2090 | 100.0% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 78 | 72 |
| Distinct (%) | 21.6% | 20.3% |
| Missing | 85 | 92 |
| Missing (%) | 19.1% | 20.6% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.090277 | 30.15678 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 1 |
| Maximum | 74 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
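Distinct (%) is computed relative to the non-missing values only. Dataset A has 78 distinct ages among 446 - 85 = 361 recorded ages (21.6%), while Missing (%) uses all 446 rows. The same convention reproduces the Cabin figures further down (88 distinct among 104 non-missing values, 84.6%). The small sketch below uses `None` to stand in for a missing value, which is an assumption about representation.

```python
def distinct_stats(values: list) -> dict:
    """Distinct count and share among non-missing values, plus share of missing rows."""
    present = [v for v in values if v is not None]   # None stands in for a missing value
    distinct = len(set(present))
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / len(present),
        "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
    }


print(distinct_stats([22.0, 38.0, None, 22.0, 35.0, None])["distinct_pct"])  # 75.0

# Age, Dataset A: 78 distinct values, 85 of 446 rows missing
print(f"{100 * 78 / (446 - 85):.1f}%  {100 * 85 / 446:.1f}%")  # 21.6%  19.1%
```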
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 1 |
| 5th percentile | 3 | 6 |
| Q1 | 20 | 21 |
| Median | 28 | 29 |
| Q3 | 38 | 38 |
| 95th percentile | 54 | 54 |
| Maximum | 74 | 80 |
| Range | 73.58 | 79 |
| Interquartile range (IQR) | 18 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.300064 | 13.737735 |
| Coefficient of variation (CV) | 0.49157539 | 0.45554382 |
| Kurtosis | 0.093756131 | 0.3903608 |
| Mean | 29.090277 | 30.15678 |
| Median Absolute Deviation (MAD) | 8.5 | 8 |
| Skewness | 0.25630784 | 0.43052165 |
| Sum | 10501.59 | 10675.5 |
| Variance | 204.49184 | 188.72535 |
| Monotonicity | Not monotonic | Not monotonic |
[Figure: histogram with fixed size bins (bins=50)]
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 17 | 3.8% |
| 30 | 16 | 3.6% |
| 21 | 13 | 2.9% |
| 22 | 13 | 2.9% |
| 25 | 13 | 2.9% |
| 36 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 28 | 12 | 2.7% |
| 35 | 12 | 2.7% |
| 19 | 12 | 2.7% |
| Other values (68) | 229 | 51.3% |
| (Missing) | 85 | 19.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 17 | 3.8% |
| 22 | 15 | 3.4% |
| 21 | 15 | 3.4% |
| 32 | 13 | 2.9% |
| 18 | 13 | 2.9% |
| 24 | 11 | 2.5% |
| 31 | 11 | 2.5% |
| 28 | 11 | 2.5% |
| 26 | 11 | 2.5% |
| 29 | 11 | 2.5% |
| Other values (62) | 226 | 50.7% |
| (Missing) | 92 | 20.6% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 5 | 1.1% |
| 2 | 7 | 1.6% |
| 3 | 3 | 0.7% |
| 4 | 4 | 0.9% |
| 5 | 3 | 0.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 3 | 0.7% |
| 2 | 4 | 0.9% |
| 3 | 4 | 0.9% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 6 | 2 | 0.4% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 9 | 3 | 0.7% |
| 11 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.5044843 | 0.49103139 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 307 | 311 |
| Zeros (%) | 68.8% | 69.7% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| Median | 0 | 0 |
| Q3 | 1 | 1 |
| 95th percentile | 2 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0969734 | 1.0509109 |
| Coefficient of variation (CV) | 2.174445 | 2.1402113 |
| Kurtosis | 20.520514 | 19.237256 |
| Mean | 0.5044843 | 0.49103139 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.9694846 | 3.769624 |
| Sum | 225 | 219 |
| Variance | 1.2033506 | 1.1044138 |
| Monotonicity | Not monotonic | Not monotonic |
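SibSp's kurtosis of about 20 and skewness near 4 reflect a long right tail: most passengers have 0, while a few have values up to 8. By contrast, PassengerId, whose IDs are spread almost evenly, has a kurtosis close to -1.2, which is the excess kurtosis of a uniform distribution. The figures look like the bias-corrected sample estimators that pandas' `Series.skew()` and `Series.kurt()` return, with kurtosis reported as excess kurtosis, but that is an assumption. The stdlib sketch below implements those estimators. It needs at least four non-constant values.

```python
from __future__ import annotations

import math


def sample_skew(xs: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness (G1), using the sample std (ddof=1)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


def sample_excess_kurtosis(xs: list[float]) -> float:
    """Bias-corrected excess kurtosis (G2); about 0 for normally distributed data."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    m4 = sum((x - mean) ** 4 for x in xs)
    return (n * (n + 1) * m4 / ((n - 1) * (n - 2) * (n - 3) * var ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


evenly_spread = [1, 2, 3, 4, 5]
zero_heavy = [0, 0, 0, 0, 0, 0, 1, 1, 2, 8]  # made-up, SibSp-like: mostly zeros, one large value
print(round(sample_excess_kurtosis(evenly_spread), 6))                      # -1.2
print(sample_skew(zero_heavy) > 0, sample_excess_kurtosis(zero_heavy) > 0)  # True True
```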
[Figure: histogram with fixed size bins (bins=7)]
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 307 | 68.8% |
| 1 | 105 | 23.5% |
| 2 | 14 | 3.1% |
| 4 | 8 | 1.8% |
| 3 | 6 | 1.3% |
| 8 | 4 | 0.9% |
| 5 | 2 | 0.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 311 | 69.7% |
| 1 | 99 | 22.2% |
| 2 | 14 | 3.1% |
| 3 | 10 | 2.2% |
| 4 | 7 | 1.6% |
| 8 | 3 | 0.7% |
| 5 | 2 | 0.4% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 307 | 68.8% |
| 1 | 105 | 23.5% |
| 2 | 14 | 3.1% |
| 3 | 6 | 1.3% |
| 4 | 8 | 1.8% |
| 5 | 2 | 0.4% |
| 8 | 4 | 0.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 311 | 69.7% |
| 1 | 99 | 22.2% |
| 2 | 14 | 3.1% |
| 3 | 10 | 2.2% |
| 4 | 7 | 1.6% |
| 5 | 2 | 0.4% |
| 8 | 3 | 0.7% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.39237668 | 0.35650224 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 6 |
| Zeros | 334 | 353 |
| Zeros (%) | 74.9% | 79.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| Median | 0 | 0 |
| Q3 | 0.75 | 0 |
| 95th percentile | 2 | 2 |
| Maximum | 6 | 6 |
| Range | 6 | 6 |
| Interquartile range (IQR) | 0.75 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.83509278 | 0.85369803 |
| Coefficient of variation (CV) | 2.1282936 | 2.3946498 |
| Kurtosis | 12.456563 | 13.053339 |
| Mean | 0.39237668 | 0.35650224 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.0649883 | 3.2723044 |
| Sum | 175 | 159 |
| Variance | 0.69737996 | 0.72880032 |
| Monotonicity | Not monotonic | Not monotonic |
[Figure: histogram with fixed size bins (bins=7)]
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 334 | 74.9% |
| 1 | 69 | 15.5% |
| 2 | 35 | 7.8% |
| 5 | 4 | 0.9% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 4 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 353 | 79.1% |
| 1 | 51 | 11.4% |
| 2 | 32 | 7.2% |
| 5 | 4 | 0.9% |
| 4 | 3 | 0.7% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 334 | 74.9% |
| 1 | 69 | 15.5% |
| 2 | 35 | 7.8% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 5 | 4 | 0.9% |
| 6 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 353 | 79.1% |
| 1 | 51 | 11.4% |
| 2 | 32 | 7.2% |
| 3 | 2 | 0.4% |
| 4 | 3 | 0.7% |
| 5 | 4 | 0.9% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 375 | 385 |
| Distinct (%) | 84.1% | 86.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.6457399 | 6.8766816 |
| Min length | 3 | 3 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 322 | 340 |
| Unique (%) | 72.2% | 76.2% |
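"Distinct" counts the different values in a column, while "Unique" counts the values that occur exactly once. Dataset A's 375 distinct tickets therefore include 322 that appear only once, and the remaining 53 are shared by two or more passengers. Unique (%) is taken over all 446 rows (322 / 446 ≈ 72.2%). The sketch below uses a made-up list assembled from ticket numbers in the Sample tables.

```python
from __future__ import annotations

from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


tickets = ["347082", "2648", "347082", "CA 2144", "11967", "CA 2144", "347082"]
print(distinct_and_unique(tickets))  # (4, 2)
```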
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 347067 | 330919 |
| 2nd row | 2648 | W./C. 14258 |
| 3rd row | 347082 | 350417 |
| 4th row | SOTON/OQ 392076 | CA 2144 |
| 5th row | 11967 | C 7077 |
Words
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| pc | 32 | 5.7% |
| c.a | 12 | 2.1% |
| ca | 8 | 1.4% |
| 2 | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| a/5 | 6 | 1.1% |
| sc/paris | 5 | 0.9% |
| 1601 | 5 | 0.9% |
| soton/oq | 4 | 0.7% |
| a/4 | 4 | 0.7% |
| Other values (394) | 471 | 84.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| pc | 33 | 5.7% |
| a/5 | 10 | 1.7% |
| c.a | 10 | 1.7% |
| 2 | 9 | 1.6% |
| ston/o | 9 | 1.6% |
| ca | 7 | 1.2% |
| soton/o.q | 6 | 1.0% |
| w./c | 5 | 0.9% |
| 347088 | 5 | 0.9% |
| a/4 | 4 | 0.7% |
| Other values (408) | 480 | 83.0% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 371 | 12.5% |
| 1 | 339 | 11.4% |
| 2 | 295 | 10.0% |
| 7 | 250 | 8.4% |
| 4 | 232 | 7.8% |
| 6 | 217 | 7.3% |
| 0 | 203 | 6.8% |
| 5 | 193 | 6.5% |
| 9 | 159 | 5.4% |
| 8 | 133 | 4.5% |
| Other values (22) | 572 | 19.3% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 372 | 12.1% |
| 1 | 357 | 11.6% |
| 2 | 295 | 9.6% |
| 7 | 258 | 8.4% |
| 4 | 227 | 7.4% |
| 0 | 214 | 7.0% |
| 6 | 197 | 6.4% |
| 5 | 187 | 6.1% |
| 9 | 163 | 5.3% |
| 8 | 147 | 4.8% |
| Other values (25) | 650 | 21.2% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2964 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 3067 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2964 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 3067 | 100.0% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 2964 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 3067 | 100.0% |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 188 | 182 |
| Distinct (%) | 42.2% | 40.8% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.000204 | 32.643834 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 263 | 512.3292 |
| Zeros | 6 | 6 |
| Zeros (%) | 1.3% | 1.3% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5th percentile | 7.225 | 7.225 |
| Q1 | 7.8958 | 7.8958 |
| Median | 14.75 | 14.4542 |
| Q3 | 32.875 | 30.5 |
| 95th percentile | 103.19375 | 130.2375 |
| Maximum | 263 | 512.3292 |
| Range | 263 | 512.3292 |
| Interquartile range (IQR) | 24.9792 | 22.6042 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 36.804407 | 52.129992 |
| Coefficient of variation (CV) | 1.2268052 | 1.5969323 |
| Kurtosis | 11.655528 | 34.581323 |
| Mean | 30.000204 | 32.643834 |
| Median Absolute Deviation (MAD) | 7.5 | 7.17295 |
| Skewness | 2.9820073 | 4.9188093 |
| Sum | 13380.091 | 14559.15 |
| Variance | 1354.5644 | 2717.536 |
| Monotonicity | Not monotonic | Not monotonic |
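The Median Absolute Deviation is the median of the absolute deviations from the median. The report appears to give it unscaled, without the 1.4826 normal-consistency factor. For PassengerId, whose values are spread almost evenly, MAD ≈ IQR / 2 (221 vs 440.5 / 2 ≈ 220), which is what an unscaled MAD gives for a uniform spread. The sketch below works under that assumption. It uses made-up fares with one extreme value, similar to Dataset B's 512.3292 maximum, to show that the outlier dominates the standard deviation but not the MAD.

```python
from __future__ import annotations

import statistics


def median_absolute_deviation(xs: list[float]) -> float:
    """Unscaled MAD: the median of |x - median(xs)|."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)


fares = [7.25, 7.75, 7.9, 8.05, 13.0, 26.0, 512.3292]  # made-up values
mad = median_absolute_deviation(fares)
print(round(mad, 4))                       # 0.8
print(round(statistics.stdev(fares), 1))   # dominated by the single large fare
```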
[Figure: histogram with fixed size bins (bins=50)]
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 20 | 4.5% |
| 8.05 | 20 | 4.5% |
| 26 | 18 | 4.0% |
| 7.75 | 16 | 3.6% |
| 7.8958 | 15 | 3.4% |
| 7.2292 | 11 | 2.5% |
| 7.775 | 10 | 2.2% |
| 10.5 | 9 | 2.0% |
| 7.925 | 8 | 1.8% |
| 7.225 | 8 | 1.8% |
| Other values (178) | 311 | 69.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 8.05 | 24 | 5.4% |
| 7.75 | 20 | 4.5% |
| 7.8958 | 16 | 3.6% |
| 26 | 16 | 3.6% |
| 13 | 16 | 3.6% |
| 10.5 | 13 | 2.9% |
| 7.925 | 10 | 2.2% |
| 7.775 | 10 | 2.2% |
| 7.25 | 9 | 2.0% |
| 7.8542 | 9 | 2.0% |
| Other values (172) | 303 | 67.9% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | 1.3% |
| 4.0125 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 2 | 0.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | 1.3% |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 7.05 | 5 | 1.1% |
| 7.0542 | 2 | 0.4% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 88 | 89 |
| Distinct (%) | 84.6% | 86.4% |
| Missing | 342 | 343 |
| Missing (%) | 76.7% | 76.9% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 11 |
| Median length | 3 | 3 |
| Mean length | 3.4519231 | 3.3300971 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 73 | 78 |
| Unique (%) | 70.2% | 75.7% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | B49 | D47 |
| 2nd row | D | C106 |
| 3rd row | E8 | E58 |
| 4th row | E40 | C2 |
| 5th row | C70 | D20 |
Words
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| g6 | 3 | 2.6% |
| b49 | 2 | 1.7% |
| b18 | 2 | 1.7% |
| d33 | 2 | 1.7% |
| b96 | 2 | 1.7% |
| b98 | 2 | 1.7% |
| f4 | 2 | 1.7% |
| b35 | 2 | 1.7% |
| f2 | 2 | 1.7% |
| d26 | 2 | 1.7% |
| Other values (89) | 96 | 82.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| f2 | 3 | 2.6% |
| d | 3 | 2.6% |
| f33 | 3 | 2.6% |
| g6 | 2 | 1.8% |
| d26 | 2 | 1.8% |
| b51 | 2 | 1.8% |
| b53 | 2 | 1.8% |
| b55 | 2 | 1.8% |
| d20 | 2 | 1.8% |
| b49 | 2 | 1.8% |
| Other values (88) | 91 | 79.8% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| B | 35 | 9.7% |
| 2 | 35 | 9.7% |
| 1 | 33 | 9.2% |
| C | 33 | 9.2% |
| 3 | 30 | 8.4% |
| 6 | 26 | 7.2% |
| 8 | 22 | 6.1% |
| 5 | 20 | 5.6% |
| 9 | 19 | 5.3% |
| 0 | 18 | 5.0% |
| Other values (9) | 88 | 24.5% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 36 | 10.5% |
| 3 | 35 | 10.2% |
| 1 | 30 | 8.7% |
| B | 30 | 8.7% |
| 2 | 29 | 8.5% |
| 5 | 21 | 6.1% |
| 8 | 19 | 5.5% |
| 0 | 18 | 5.2% |
| D | 17 | 5.0% |
| 7 | 17 | 5.0% |
| Other values (9) | 91 | 26.5% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 359 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 343 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 359 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 343 | 100.0% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 359 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 343 | 100.0% |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 0 |
| Missing (%) | 0.2% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Most frequent categories (Dataset A): S, C, Q
Most frequent categories (Dataset B): S, C, Q
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
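Distinct counts how many different values occur. In this report, Unique counts the values that occur exactly once. Embarked has three codes that each repeat many times, so Distinct is 3 and Unique is 0. The following minimal sketch computes both measures. It assumes missing cells are excluded and percentages are taken over non-missing cells; for Embarked, either denominator rounds to 0.7%. `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values seen exactly once."""
    present = [v for v in values if v is not None]  # assumption: missing cells excluded
    counts = Counter(present)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": n_distinct,
        "distinct_pct": round(100 * n_distinct / len(present), 1),
        "unique": n_unique,
        "unique_pct": round(100 * n_unique / len(present), 1),
    }


# Embarked in Dataset A: 313 S, 93 C, 39 Q and one missing cell
embarked_a = ["S"] * 313 + ["C"] * 93 + ["Q"] * 39 + [None]
print(distinct_and_unique(embarked_a))
# {'distinct': 3, 'distinct_pct': 0.7, 'unique': 0, 'unique_pct': 0.0}

# A toy column where two values appear only once
print(distinct_and_unique(["S", "S", "C", "Q"]))
# {'distinct': 3, 'distinct_pct': 75.0, 'unique': 2, 'unique_pct': 50.0}
```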
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | Q |
| 2nd row | C | S |
| 3rd row | S | S |
| 4th row | S | S |
| 5th row | C | S |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 313 | 70.2% |
| C | 93 | 20.9% |
| Q | 39 | 8.7% |
| (Missing) | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 90 | 20.2% |
| Q | 39 | 8.7% |
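In Common Values, Frequency (%) divides each count by all 446 rows, including missing cells. For example, Q's 39 occurrences give 39/446 ≈ 8.7% in both datasets. The following minimal stdlib sketch reproduces the Dataset A table. `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, count, percent); percent is taken over all rows, missing cells included."""
    n = len(values)
    counts = Counter("(Missing)" if v is None else v for v in values)
    return [(value, count, round(100 * count / n, 1)) for value, count in counts.most_common()]


embarked_a = ["S"] * 313 + ["C"] * 93 + ["Q"] * 39 + [None]
for row in frequency_table(embarked_a):
    print(row)
# ('S', 313, 70.2)
# ('C', 93, 20.9)
# ('Q', 39, 8.7)
# ('(Missing)', 1, 0.2)
```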
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| s | 313 | 70.3% |
| c | 93 | 20.9% |
| q | 39 | 8.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| s | 317 | 71.1% |
| c | 90 | 20.2% |
| q | 39 | 8.7% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 313 | 70.3% |
| C | 93 | 20.9% |
| Q | 39 | 8.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 90 | 20.2% |
| Q | 39 | 8.7% |
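Here the percentages are taken over characters rather than rows. Dataset A's single missing Embarked cell contributes no characters, so Q's share is 39/445 ≈ 8.8%, not the 8.7% shown under Common Values. The following minimal sketch shows the difference. `char_frequency` is a hypothetical helper.

```python
from collections import Counter


def char_frequency(values):
    """Rows of (character, count, percent) over all characters in non-missing cells."""
    chars = Counter(ch for v in values if v is not None for ch in v)
    total = sum(chars.values())  # missing cells add nothing to the denominator
    return [(ch, n, round(100 * n / total, 1)) for ch, n in chars.most_common()]


embarked_a = ["S"] * 313 + ["C"] * 93 + ["Q"] * 39 + [None]
for row in char_frequency(embarked_a):
    print(row)
# ('S', 313, 70.3)
# ('C', 93, 20.9)
# ('Q', 39, 8.8)
```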
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Most frequent character per category
(unknown)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 313 | 70.3% |
| C | 93 | 20.9% |
| Q | 39 | 8.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 90 | 20.2% |
| Q | 39 | 8.7% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Most frequent character per script
(unknown)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 313 | 70.3% |
| C | 93 | 20.9% |
| Q | 39 | 8.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 90 | 20.2% |
| Q | 39 | 8.7% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 446 | 100.0% |
Most frequent character per block
(unknown)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 313 | 70.3% |
| C | 93 | 20.9% |
| Q | 39 | 8.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 317 | 71.1% |
| C | 90 | 20.2% |
| Q | 39 | 8.7% |
Interactions
Interaction plots for each pair of the five numeric variables (Age, Fare, Parch, PassengerId, SibSp): 25 panels, each shown for Dataset A and Dataset B.
Correlations
Heatmaps: Dataset A, Dataset B
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.097 | -0.233 | 0.097 | 0.244 | 0.035 | -0.178 | 0.156 |
| Embarked | 0.000 | 1.000 | 0.238 | 0.000 | 0.000 | 0.271 | 0.129 | 0.038 | 0.134 |
| Fare | 0.097 | 0.238 | 1.000 | 0.383 | -0.022 | 0.542 | 0.185 | 0.423 | 0.272 |
| Parch | -0.233 | 0.000 | 0.383 | 1.000 | -0.001 | 0.060 | 0.208 | 0.482 | 0.154 |
| PassengerId | 0.097 | 0.000 | -0.022 | -0.001 | 1.000 | 0.100 | 0.000 | -0.060 | 0.059 |
| Pclass | 0.244 | 0.271 | 0.542 | 0.060 | 0.100 | 1.000 | 0.135 | 0.103 | 0.279 |
| Sex | 0.035 | 0.129 | 0.185 | 0.208 | 0.000 | 0.135 | 1.000 | 0.197 | 0.583 |
| SibSp | -0.178 | 0.038 | 0.423 | 0.482 | -0.060 | 0.103 | 0.197 | 1.000 | 0.218 |
| Survived | 0.156 | 0.134 | 0.272 | 0.154 | 0.059 | 0.279 | 0.583 | 0.218 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.204 | -0.174 | 0.030 | 0.285 | 0.000 | -0.123 | 0.079 |
| Embarked | 0.000 | 1.000 | 0.192 | 0.032 | 0.051 | 0.294 | 0.151 | 0.040 | 0.214 |
| Fare | 0.204 | 0.192 | 1.000 | 0.392 | -0.013 | 0.479 | 0.135 | 0.417 | 0.267 |
| Parch | -0.174 | 0.032 | 0.392 | 1.000 | 0.032 | 0.000 | 0.185 | 0.400 | 0.000 |
| PassengerId | 0.030 | 0.051 | -0.013 | 0.032 | 1.000 | 0.000 | 0.122 | -0.028 | 0.084 |
| Pclass | 0.285 | 0.294 | 0.479 | 0.000 | 0.000 | 1.000 | 0.094 | 0.103 | 0.320 |
| Sex | 0.000 | 0.151 | 0.135 | 0.185 | 0.122 | 0.094 | 1.000 | 0.150 | 0.543 |
| SibSp | -0.123 | 0.040 | 0.417 | 0.400 | -0.028 | 0.103 | 0.150 | 1.000 | 0.092 |
| Survived | 0.079 | 0.214 | 0.267 | 0.000 | 0.084 | 0.320 | 0.543 | 0.092 | 1.000 |
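This section does not name the coefficient behind these matrices. Signed values appear only between the five numeric columns (Age, Fare, Parch, PassengerId, SibSp). Every entry that involves a categorical column is non-negative, which suggests a rank-type coefficient for numeric pairs and a separate association measure for the rest. The following minimal sketch implements one candidate for the numeric pairs: Spearman's rank correlation, with average ranks for ties. This is an assumption, not the confirmed method of the report, and the helper names are hypothetical. A 10-row sample will not reproduce the 0.482 shown for SibSp and Parch in Dataset A.

```python
def average_ranks(xs):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# SibSp and Parch from the first Dataset A sample table (10 rows only)
sibsp = [0, 0, 4, 0, 1, 0, 1, 1, 0, 2]
parch = [0, 0, 2, 0, 0, 0, 0, 0, 0, 0]
print(round(spearman(sibsp, parch), 3))  # 0.565
```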
Missing values
Dataset A
A simple visualization of nullity by column.
Dataset B
A simple visualization of nullity by column.
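The following minimal sketch shows the counts behind a per-column nullity chart. It assumes the chart plots the number of non-missing cells in each column, as missingno's bar chart does. The rows are five Dataset B sample records, with None standing in for NaN. `nullity_counts` and `render_bars` are hypothetical helpers, and the text bars only approximate the plotted chart.

```python
def nullity_counts(rows, columns):
    """Number of non-missing cells per column (None marks a missing cell)."""
    return {col: sum(row[col] is not None for row in rows) for col in columns}


def render_bars(counts, n_rows, width=10):
    """Print one text bar per column, scaled to the number of rows."""
    for col, n in counts.items():
        bar = "#" * round(width * n / n_rows)
        print(f"{col:<9}{bar:<{width}} {n}/{n_rows}")


rows = [
    {"Age": None, "Cabin": None, "Embarked": "Q"},   # O'Leary
    {"Age": 50.0, "Cabin": None, "Embarked": "S"},   # Ridsdale
    {"Age": 16.0, "Cabin": None, "Embarked": "S"},   # Goodwin
    {"Age": 19.0, "Cabin": "D47", "Embarked": "S"},  # Newsom
    {"Age": None, "Cabin": None, "Embarked": "Q"},   # O'Brien
]
counts = nullity_counts(rows, ["Age", "Cabin", "Embarked"])
print(counts)  # {'Age': 3, 'Cabin': 1, 'Embarked': 5}
render_bars(counts, len(rows))
```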
Dataset A
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
Dataset B
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
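The following text sketch illustrates the nullity matrix. Each line is one record and each character is one column, with '#' for a present value and '.' for a missing one. The rows are five Dataset B sample records, and `nullity_matrix` is a hypothetical helper. Even at this size, the mostly empty Cabin column stands out.

```python
def nullity_matrix(rows, columns):
    """One string per record: '#' where the value is present, '.' where it is missing."""
    return ["".join("#" if row[col] is not None else "." for col in columns) for row in rows]


columns = ["Age", "Cabin", "Embarked"]
rows = [
    {"Age": None, "Cabin": None, "Embarked": "Q"},   # O'Leary
    {"Age": 50.0, "Cabin": None, "Embarked": "S"},   # Ridsdale
    {"Age": 16.0, "Cabin": None, "Embarked": "S"},   # Goodwin
    {"Age": 19.0, "Cabin": "D47", "Embarked": "S"},  # Newsom
    {"Age": None, "Cabin": None, "Embarked": "Q"},   # O'Brien
]
for line in nullity_matrix(rows, columns):
    print(line)
# ..#
# #.#
# #.#
# ###
# ..#
```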
Dataset A
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
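A common way to quantify nullity correlation is the Pearson correlation between 0/1 "is missing" indicators; missingno's heatmap takes this approach. The following minimal sketch assumes the same definition here, which the report does not state explicitly. It uses five Dataset B sample rows, and `nullity_correlation` is a hypothetical helper. A column with no missing cells, such as Embarked in these rows, has a constant indicator, so the sketch raises an error for it.

```python
def pearson(x, y):
    """Pearson correlation; undefined when either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("nullity correlation is undefined for a constant indicator")
    return cov / (vx * vy) ** 0.5


def nullity_correlation(rows, a, b):
    """Pearson correlation between the 0/1 'is missing' indicators of columns a and b."""
    missing_a = [1 if row[a] is None else 0 for row in rows]
    missing_b = [1 if row[b] is None else 0 for row in rows]
    return pearson(missing_a, missing_b)


rows = [
    {"Age": None, "Cabin": None, "Embarked": "Q"},   # O'Leary
    {"Age": 50.0, "Cabin": None, "Embarked": "S"},   # Ridsdale
    {"Age": 16.0, "Cabin": None, "Embarked": "S"},   # Goodwin
    {"Age": 19.0, "Cabin": "D47", "Embarked": "S"},  # Newsom
    {"Age": None, "Cabin": None, "Embarked": "Q"},   # O'Brien
]
print(round(nullity_correlation(rows, "Age", "Cabin"), 3))  # 0.408
```

On these five rows only, a missing Age tends to coincide with a missing Cabin. The full-dataset heatmap may differ.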
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 231 | 232 | 0 | 3 | Larsson, Mr. Bengt Edvin | male | 29.0 | 0 | 0 | 347067 | 7.7750 | NaN | S |
| 378 | 379 | 0 | 3 | Betros, Mr. Tannous | male | 20.0 | 0 | 0 | 2648 | 4.0125 | NaN | C |
| 119 | 120 | 0 | 3 | Andersson, Miss. Ellis Anna Maria | female | 2.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
| 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 484 | 485 | 1 | 1 | Bishop, Mr. Dickinson H | male | 25.0 | 1 | 0 | 11967 | 91.0792 | B49 | C |
| 292 | 293 | 0 | 2 | Levy, Mr. Rene Jacques | male | 36.0 | 0 | 0 | SC/Paris 2163 | 12.8750 | D | C |
| 73 | 74 | 0 | 3 | Chronopoulos, Mr. Apostolos | male | 26.0 | 1 | 0 | 2680 | 14.4542 | NaN | C |
| 724 | 725 | 1 | 1 | Chambers, Mr. Norman Campbell | male | 27.0 | 1 | 0 | 113806 | 53.1000 | E8 | S |
| 337 | 338 | 1 | 1 | Burns, Miss. Elizabeth Margaret | female | 41.0 | 0 | 0 | 16966 | 134.5000 | E40 | C |
| 333 | 334 | 0 | 3 | Vander Planke, Mr. Leo Edmondus | male | 16.0 | 2 | 0 | 345764 | 18.0000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 653 | 654 | 1 | 3 | O'Leary, Miss. Hanora "Norah" | female | NaN | 0 | 0 | 330919 | 7.8292 | NaN | Q |
| 526 | 527 | 1 | 2 | Ridsdale, Miss. Lucy | female | 50.0 | 0 | 0 | W./C. 14258 | 10.5000 | NaN | S |
| 569 | 570 | 1 | 3 | Jonsson, Mr. Carl | male | 32.0 | 0 | 0 | 350417 | 7.8542 | NaN | S |
| 71 | 72 | 0 | 3 | Goodwin, Miss. Lillian Amy | female | 16.0 | 5 | 2 | CA 2144 | 46.9000 | NaN | S |
| 376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S |
| 293 | 294 | 0 | 3 | Haas, Miss. Aloisia | female | 24.0 | 0 | 0 | 349236 | 8.8500 | NaN | S |
| 120 | 121 | 0 | 2 | Hickman, Mr. Stanley George | male | 21.0 | 2 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
| 72 | 73 | 0 | 2 | Hood, Mr. Ambrose Jr | male | 21.0 | 0 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
| 136 | 137 | 1 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S |
| 552 | 553 | 0 | 3 | O'Brien, Mr. Timothy | male | NaN | 0 | 0 | 330979 | 7.8292 | NaN | Q |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 240 | 241 | 0 | 3 | Zabour, Miss. Thamine | female | NaN | 1 | 0 | 2665 | 14.4542 | NaN | C |
| 210 | 211 | 0 | 3 | Ali, Mr. Ahmed | male | 24.0 | 0 | 0 | SOTON/O.Q. 3101311 | 7.0500 | NaN | S |
| 698 | 699 | 0 | 1 | Thayer, Mr. John Borland | male | 49.0 | 1 | 1 | 17421 | 110.8833 | C68 | C |
| 797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | NaN | S |
| 33 | 34 | 0 | 2 | Wheadon, Mr. Edward H | male | 66.0 | 0 | 0 | C.A. 24579 | 10.5000 | NaN | S |
| 50 | 51 | 0 | 3 | Panula, Master. Juha Niilo | male | 7.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 23 | 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28.0 | 0 | 0 | 113788 | 35.5000 | A6 | S |
| 255 | 256 | 1 | 3 | Touma, Mrs. Darwis (Hanne Youssef Razi) | female | 29.0 | 0 | 2 | 2650 | 15.2458 | NaN | C |
| 494 | 495 | 0 | 3 | Stanley, Mr. Edward Roland | male | 21.0 | 0 | 0 | A/4 45380 | 8.0500 | NaN | S |
| 283 | 284 | 1 | 3 | Dorking, Mr. Edward Arthur | male | 19.0 | 0 | 0 | A/5. 10482 | 8.0500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 489 | 490 | 1 | 3 | Coutts, Master. Eden Leslie "Neville" | male | 9.0 | 1 | 1 | C.A. 37671 | 15.9000 | NaN | S |
| 215 | 216 | 1 | 1 | Newell, Miss. Madeleine | female | 31.0 | 1 | 0 | 35273 | 113.2750 | D36 | C |
| 325 | 326 | 1 | 1 | Young, Miss. Marie Grice | female | 36.0 | 0 | 0 | PC 17760 | 135.6333 | C32 | C |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 213 | 214 | 0 | 2 | Givard, Mr. Hans Kristensen | male | 30.0 | 0 | 0 | 250646 | 13.0000 | NaN | S |
| 399 | 400 | 1 | 2 | Trout, Mrs. William H (Jessie L) | female | 28.0 | 0 | 0 | 240929 | 12.6500 | NaN | S |
| 340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.0 | 1 | 1 | 230080 | 26.0000 | F2 | S |
| 621 | 622 | 1 | 1 | Kimball, Mr. Edwin Nelson Jr | male | 42.0 | 1 | 0 | 11753 | 52.5542 | D19 | S |
| 98 | 99 | 1 | 2 | Doling, Mrs. John T (Ada Julia Bone) | female | 34.0 | 0 | 1 | 231919 | 23.0000 | NaN | S |
| 539 | 540 | 1 | 1 | Frolicher, Miss. Hedwig Margaritha | female | 22.0 | 0 | 2 | 13568 | 49.5000 | B39 | C |
Duplicate rows
Dataset A
This dataset does not contain duplicate rows.
Dataset B
This dataset does not contain duplicate rows.
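The following minimal sketch counts duplicate rows. In the sketch, a row counts as a duplicate only when every field matches. The Dataset B sample includes Hickman and Hood, who share ticket S.O.C. 14879 and fare 73.5 but differ in PassengerId and Name, so they are not duplicates. Comparing only a subset of columns changes that. `duplicate_rows` and its `subset` parameter are hypothetical, and missing cells would need to be represented as None for equality to behave predictably.

```python
from collections import Counter


def duplicate_rows(rows, subset=None):
    """Map each repeated row (or its projection onto `subset` indices) to its count."""
    if subset is None:
        keys = (tuple(row) for row in rows)
    else:
        keys = (tuple(row[i] for i in subset) for row in rows)
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# (PassengerId, Name, Ticket, Fare) from two Dataset B sample rows
rows = [
    (121, "Hickman, Mr. Stanley George", "S.O.C. 14879", 73.5),
    (73, "Hood, Mr. Ambrose Jr", "S.O.C. 14879", 73.5),
]
print(duplicate_rows(rows))                 # {}
print(duplicate_rows(rows, subset=[2, 3]))  # {('S.O.C. 14879', 73.5): 2}
```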